Integrating planning for task-completion dialogue policy learning
Authors
Abstract
Training a task-completion dialogue agent with real users via reinforcement learning (RL) can be prohibitively expensive because it requires many interactions with users. One alternative is to use a user simulator, but the discrepancy between simulated and real users makes the learned policy unreliable in practice. This paper addresses these challenges by integrating planning into dialogue policy learning based on the Dyna-Q framework, providing a more sample-efficient approach to learning dialogue policies. The proposed agent consists of a planner, trained online with limited real user experience, that can generate large amounts of simulated experience to supplement that limited real experience, and a policy model trained on the resulting hybrid experience. The effectiveness of the approach is validated on a movie-booking task in both a simulation setting and a human-in-the-loop setting.
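The Dyna-Q loop that the abstract builds on can be illustrated with a minimal tabular sketch. This is not the paper's agent: the toy chain environment, the dictionary-based world model standing in for the planner, and every name below (chain_step, dyna_q, planning_steps, and so on) are assumptions made for illustration only. Each real step is used three ways: to update the value estimates directly, to update the learned model, and to trigger a number of extra updates on transitions replayed from that model.

```python
import random
from collections import defaultdict


def chain_step(state, action, n_states=6):
    """Hypothetical toy environment: move left (0) or right (1) on a chain.

    Reaching the last state ends the episode with reward 1.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: direct RL + model learning + planning."""
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)  # Q[(state, action)] value estimates
    model = {}              # learned model: (s, a) -> (s', r, done)

    def choose(s):
        # Epsilon-greedy with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for real and simulated data.
        future = 0.0 if done else gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (r + future - q[(s, a)])

    real_steps = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(s)
            s2, r, done = chain_step(s, a)   # real experience
            real_steps += 1
            update(s, a, r, s2, done)        # direct RL
            model[(s, a)] = (s2, r, done)    # model learning
            for _ in range(planning_steps):  # planning on simulated data
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, real_steps


if __name__ == "__main__":
    q, real_steps = dyna_q()
    policy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)]
    print("greedy policy:", policy, "real steps used:", real_steps)
```

In this sketch, planning_steps controls how many simulated updates follow each real interaction; setting it to 0 reduces the loop to plain Q-learning, which is one way to compare how much real experience each variant needs on the toy task.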
Similar resources
Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks. For example, the agent needs to reserve a hotel and book a flight so that there is enough time to commute between arrival and hotel check-in. This paper addresses this challenge by formulating the task in the mathematical fra...
Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning
Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks. For example, the agent needs to reserve a hotel and book a flight so that there is enough time to commute between arrival and hotel check-in. This paper addresses this challenge by formulating the task in the mathematical fra...
The Effects of Task Variation on the Accuracy and Complexity of Iranian EFL Learners’ Oral Performance
Task variation is an integrative method aiming at the importance of learner-to-learner interactions in a wide range of learning contexts and fostering authentic use of language and meaningful communication. This study investigated the impact of task variation on the accuracy and complexity of Iranian EFL learners’ oral speech. In so doing, 80 intermediate EFL learners, majoring in English at the I...
Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning
This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the ...
Optimising Turn-Taking Strategies With Reinforcement Learning
In this paper, reinforcement learning (RL) is used to learn an efficient turn-taking management model in a simulated slot-filling task with the objective of minimising the dialogue duration and maximising the task completion ratio. Turn-taking decisions are handled in a separate new module, the Scheduler. Unlike most dialogue systems, a dialogue turn is split into microturns and the Scheduler ma...
Journal title:
- CoRR
Volume abs/1801.06176
Publication date 2018